AITopics | uncertainty parameter

Neural Information Processing Systems, Apr-25-2026, 07:23:57 GMT

Training is stalled while the replay buffer is smaller than the minibatch size, i.e., while |B| < M. Algorithms 3 and 4 show the critic network update and the update of the actor network and the uncertainty parameter sampler, respectively. Although we write the gradient-based updates as minibatch stochastic gradient steps for simplicity, in practice we employ an adaptive optimizer such as Adam [16]. p_k is updated as an exponential moving average with momentum 1/T_last, where T_last is the number of steps spent in the last episode (T_last is set to 1000 for the first episode). The reason behind this design choice is as follows: a short episode indicates that a bad uncertainty parameter ω was used in the last episode.
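A minimal Python sketch of the two bookkeeping rules above follows. It assumes (our reading; the excerpt does not define p_k) that p_k is the probability of sampling the k-th candidate uncertainty parameter, and that the moving-average target is the one-hot indicator of the candidate used in the last episode. The function names `can_update` and `update_sampling_probs` are hypothetical.

```python
from typing import List


def can_update(buffer_size: int, batch_size: int) -> bool:
    """Training is stalled while |B| < M; updates start once a full minibatch fits."""
    return buffer_size >= batch_size


def update_sampling_probs(p: List[float], k_last: int, t_last: int) -> List[float]:
    """Exponential moving average of p with momentum 1 / T_last.

    ASSUMPTION (hypothetical, not stated in the excerpt): the EMA target is the
    one-hot indicator of the candidate index k_last used in the last episode.
    """
    alpha = 1.0 / t_last  # a short episode gives a larger step toward k_last
    return [(1.0 - alpha) * pk + alpha * (1.0 if k == k_last else 0.0)
            for k, pk in enumerate(p)]


if __name__ == "__main__":
    print(can_update(buffer_size=100, batch_size=256))  # False: training is stalled
    p = [0.25, 0.25, 0.25, 0.25]
    p = update_sampling_probs(p, k_last=2, t_last=1000)  # T_last = 1000 for the first episode
    print(p)
```

Under this assumption, a short episode (small T_last) yields a large momentum 1/T_last, so probability mass moves faster toward the ω that produced it, and the one-hot target keeps the p_k summing to 1.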

artificial intelligence, machine learning, worst-case performance

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

2e0f5561c1553a97cee5fa64575358c9-Paper-Conference.pdf

Neural Information Processing Systems, Apr-25-2026, 07:23:54 GMT

artificial intelligence, machine learning, reinforcement learning

Neural Information Processing Systems

Country:

Europe (1.00)
Asia (1.00)
North America > United States > California (0.46)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Robots (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

2e0f5561c1553a97cee5fa64575358c9-Supplemental-Conference.pdf

Neural Information Processing Systems, Feb-8-2026, 03:17:03 GMT

m2td3, uncertainty parameter, worst-case performance

Neural Information Processing Systems

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.69)

2e0f5561c1553a97cee5fa64575358c9-Paper-Conference.pdf

Neural Information Processing Systems, Feb-8-2026, 03:17:00 GMT

international conference, uncertainty parameter, worst-case performance

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Long Beach (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
Asia > Japan > Honshū > Kantō > Ibaraki Prefecture > Tsukuba (0.05)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Robots (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Max-Min Off-Policy Actor-Critic Method Focusing on Worst-Case Robustness to Model Misspecification

Neural Information Processing Systems, Dec-23-2025, 23:32:10 GMT

In the field of reinforcement learning, because of the high cost and risk of policy training in the real world, policies are trained in a simulation environment and transferred to the corresponding real-world environment. However, the simulation environment does not perfectly mimic the real-world environment, leading to model misspecification. Multiple studies report significant deterioration of policy performance in a real-world environment. In this study, we focus on scenarios involving a simulation environment with uncertainty parameters and the set of their possible values, called the uncertainty parameter set. The aim is to optimize the worst-case performance on the uncertainty parameter set to guarantee performance in the corresponding real-world environment. To obtain such a policy, we propose an off-policy actor-critic approach called the Max-Min Twin Delayed Deep Deterministic Policy Gradient algorithm (M2TD3), which solves a max-min optimization problem using a simultaneous gradient ascent-descent approach. Experiments in multi-joint dynamics with contact (MuJoCo) environments show that the proposed method achieved better worst-case performance than several baseline approaches.
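The abstract names simultaneous gradient ascent-descent as the solver for the max-min problem. The sketch below shows that update rule on a hypothetical scalar toy objective f(θ, ω) = -θ²/2 + θω + ω²/2 (not from the paper): θ stands in for the policy parameters (maximized), and ω for the uncertainty parameter (minimized, then projected back onto an assumed interval uncertainty set Ω = [-1, 2]). Both gradients are evaluated at the same point before either player moves, which is what makes the update simultaneous. In M2TD3 the gradients come from learned actor and critic networks; this only illustrates the update pattern.

```python
from typing import Tuple


def grad_f(theta: float, omega: float) -> Tuple[float, float]:
    """Gradients of the toy objective f = -theta**2/2 + theta*omega + omega**2/2."""
    return -theta + omega, theta + omega


def project(omega: float, lo: float, hi: float) -> float:
    """Project omega back onto the (assumed) interval uncertainty set [lo, hi]."""
    return min(max(omega, lo), hi)


def simultaneous_gda(theta: float, omega: float, lr: float = 0.1,
                     steps: int = 500, lo: float = -1.0,
                     hi: float = 2.0) -> Tuple[float, float]:
    for _ in range(steps):
        g_theta, g_omega = grad_f(theta, omega)      # both taken at the current point
        theta = theta + lr * g_theta                   # ascent step for the maximizer
        omega = project(omega - lr * g_omega, lo, hi)  # descent step, then projection
    return theta, omega


if __name__ == "__main__":
    print(simultaneous_gda(theta=1.5, omega=1.0))
```

Ignoring the projection, the minimum of f over ω is -θ², so the max-min solution is the saddle point (0, 0), and the iterates spiral into it. For a purely bilinear objective f = θω, plain simultaneous updates spiral outward instead, so this toy relies on its quadratic terms to converge.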

max-min off-policy actor-critic method, real-world environment, worst-case robustness

Neural Information Processing Systems

Genre: Research Report (0.60)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.40)

Correlated Uncertainty for Learning Dense Correspondences from Noisy Labels

Natalia Neverova, David Novotny, Andrea Vedaldi

Neural Information Processing Systems, Oct-2-2025, 18:03:21 GMT

Neural Information Processing Systems http://nips.cc/

annotation, artificial intelligence, machine learning

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.96)

Max-Min Off-Policy Actor-Critic Method Focusing on Worst-Case Robustness to Model Misspecification

Neural Information Processing Systems, Oct-10-2024, 12:11:56 GMT

max-min off-policy actor-critic method, model misspecification, real-world environment

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.43)

Learning Uncertainty-Aware Temporally-Extended Actions

Lee, Joongkyu; Park, Seung Joon; Tang, Yunhao; Oh, Min-hwan

arXiv.org Artificial Intelligence, Feb-8-2024

In reinforcement learning, temporal abstraction in the action space, exemplified by action repetition, is a technique that facilitates policy learning through extended actions. However, a primary limitation of action repetition as studied previously is its potential to degrade performance, particularly when sub-optimal actions are repeated. This issue often negates the advantages of action repetition. To address it, we propose a novel algorithm named Uncertainty-aware Temporal Extension (UTE). UTE employs ensemble methods to accurately measure uncertainty during action extension. This allows policies to strategically choose between emphasizing exploration and adopting an uncertainty-averse approach, tailored to their specific needs. We demonstrate the effectiveness of UTE through experiments in Gridworld and Atari 2600 environments. Our findings show that UTE outperforms existing action repetition algorithms, effectively mitigating their inherent limitations and significantly enhancing policy learning efficiency.
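As a rough illustration of how an ensemble can drive that choice, the sketch below scores each candidate extension length by the ensemble mean plus λ times the ensemble standard deviation and picks the highest score: λ > 0 favors uncertain lengths (exploration) and λ < 0 penalizes them (uncertainty-averse). This mean-plus-scaled-deviation rule, the function name `select_extension_length`, and the example numbers are assumptions for illustration, not UTE's published implementation.

```python
from statistics import mean, pstdev
from typing import Sequence


def select_extension_length(q_ensemble: Sequence[Sequence[float]],
                            lengths: Sequence[int], lam: float) -> int:
    """Pick an extension length from ensemble value estimates.

    q_ensemble[i][j] is ensemble member i's value estimate for repeating the
    chosen action lengths[j] times (hypothetical layout).
    """
    best_j, best_score = 0, float("-inf")
    for j in range(len(lengths)):
        estimates = [member[j] for member in q_ensemble]
        # lam sets the attitude toward uncertainty (ensemble disagreement)
        score = mean(estimates) + lam * pstdev(estimates)
        if score > best_score:
            best_j, best_score = j, score
    return lengths[best_j]


if __name__ == "__main__":
    # Length 1: members agree (low uncertainty). Length 4: members disagree.
    q = [[1.0, 0.0], [1.0, 0.5], [1.0, 2.0]]
    print(select_extension_length(q, lengths=[1, 4], lam=1.0))   # 4 (exploratory)
    print(select_extension_length(q, lengths=[1, 4], lam=-1.0))  # 1 (uncertainty-averse)
```

With these numbers, the length-4 option has a lower mean (about 0.83) but a large spread (about 0.85), so the exploratory setting picks it while the uncertainty-averse setting keeps the length whose estimates agree.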

agent, algorithm, extension length

arXiv.org Artificial Intelligence

2402.05439

Country:

Asia > South Korea > Seoul > Seoul (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Japan > Honshū > Chūbu > Toyama Prefecture > Toyama (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Max-Min Off-Policy Actor-Critic Method Focusing on Worst-Case Robustness to Model Misspecification

Tanabe, Takumi; Sato, Rei; Fukuchi, Kazuto; Sakuma, Jun; Akimoto, Youhei

arXiv.org Artificial Intelligence, Jan-11-2023

artificial intelligence, machine learning, reinforcement learning

arXiv.org Artificial Intelligence

2211.03413

Country:

North America > United States > California > Los Angeles County > Long Beach (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
Asia > Japan > Honshū > Kantō > Ibaraki Prefecture > Tsukuba (0.05)

Genre: Research Report > New Finding (0.66)

Industry: Leisure & Entertainment (0.74)

Technology:

Information Technology > Artificial Intelligence > Robots (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Efficient Characterization of Dynamic Response Variation Using Multi-Fidelity Data Fusion through Composite Neural Network

Zhou, Kai; Tang, Jiong

arXiv.org Machine Learning, May-6-2020

Uncertainties in a structure are inevitable and generally lead to variation in dynamic response predictions. For a complex structure, brute-force Monte Carlo simulation for response variation analysis is infeasible, since a single run may already be computationally costly. Data-driven meta-modeling approaches have thus been explored to facilitate efficient emulation and statistical inference. The performance of a meta-model hinges on both the quality and the quantity of the training dataset. In practice, however, high-fidelity data acquired from high-dimensional finite element simulations or experiments are generally scarce, which poses a significant challenge to building a meta-model. In this research, we take advantage of the multi-level response prediction opportunity in structural dynamic analysis, i.e., rapidly acquiring a large amount of low-fidelity data from reduced-order modeling and accurately acquiring a small amount of high-fidelity data from full-scale finite element analysis. Specifically, we formulate a composite neural network fusion approach that can fully utilize the multi-level, heterogeneous datasets obtained. It implicitly identifies the correlation between the low- and high-fidelity datasets, which yields improved accuracy compared with the state of the art. Comprehensive investigations using frequency response variation characterization as a case example are carried out to demonstrate its performance.
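The data flow of multi-fidelity fusion (many cheap low-fidelity samples, a few expensive high-fidelity samples, and a model that learns how the two relate) can be sketched without a neural network. The example below swaps in a much simpler technique than the authors' composite neural network: a piecewise-linear low-fidelity surrogate followed by a linear correction y_H ≈ ρ·f_L(x) + b·x + a, fitted by least squares on the high-fidelity points. The test functions `f_low` and `f_high` are hypothetical stand-ins for reduced-order and full-scale finite element models.

```python
import math
from bisect import bisect_right
from typing import Callable, List, Sequence


def interp(xs: Sequence[float], ys: Sequence[float], x: float) -> float:
    """Piecewise-linear interpolation on sorted xs (cheap low-fidelity surrogate)."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x) - 1
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_fusion(lf_x: Sequence[float], lf_y: Sequence[float],
               hf_x: Sequence[float], hf_y: Sequence[float]) -> Callable[[float], float]:
    """Fit y_H ~ rho * f_L(x) + b * x + a by least squares on the HF samples."""
    feats = [[interp(lf_x, lf_y, x), x, 1.0] for x in hf_x]
    ata = [[sum(f[i] * f[j] for f in feats) for j in range(3)] for i in range(3)]
    aty = [sum(f[i] * y for f, y in zip(feats, hf_y)) for i in range(3)]
    rho, b, a = solve(ata, aty)
    return lambda x: rho * interp(lf_x, lf_y, x) + b * x + a


def f_low(x: float) -> float:   # hypothetical reduced-order model
    return math.sin(8 * x)


def f_high(x: float) -> float:  # hypothetical full-scale model
    return 1.5 * math.sin(8 * x) + 0.3 * x + 0.1


lf_x = [i / 200 for i in range(201)]  # many cheap samples
lf_y = [f_low(x) for x in lf_x]
hf_x = [j / 5 for j in range(6)]      # few expensive samples
hf_y = [f_high(x) for x in hf_x]

fused = fit_fusion(lf_x, lf_y, hf_x, hf_y)
test_x = [(k + 0.25) / 100 for k in range(100)]
fused_err = max(abs(fused(x) - f_high(x)) for x in test_x)
hf_only_err = max(abs(interp(hf_x, hf_y, x) - f_high(x)) for x in test_x)
print(f"fused max error {fused_err:.2e}, HF-only max error {hf_only_err:.2e}")
```

Because `f_high` here is exactly a scaled copy of `f_low` plus a linear trend, six high-fidelity samples suffice to recover the correction, while interpolating those six samples alone misses most of the oscillation. The composite network described above learns the low/high-fidelity correlation implicitly and is not limited to this linear form.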

artificial intelligence, deep learning, machine learning

arXiv.org Machine Learning

2005.03213

Country: North America > United States > Connecticut > Tolland County > Storrs (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
